Use of Checkpoint-Restart for Complex HEP Software on Traditional Architectures and Intel MIC
Abstract
Process checkpoint-restart is a technology with great potential for use in HEP workflows. Use cases include debugging, reducing the startup time of applications both in offline batch jobs and in the High Level Trigger, permitting job preemption in environments where spare CPU cycles are being used opportunistically, and efficiently scheduling a mix of multicore and single-threaded jobs. We report on tests of checkpoint-restart technology using CMS software, Geant4-MT (multi-threaded Geant4), and the DMTCP (Distributed Multithreaded Checkpointing) package. We analyze both single- and multi-threaded applications and test on both standard Intel x86 architectures and Intel MIC. The tests with multi-threaded applications on Intel MIC are used to assess scalability and performance. These are considered an indicator of what the future may hold for many-core computing.
Related resources
Multi-Kepler GPU vs. multi-Intel MIC for spin systems simulations
We present and compare the performance of two many-core architectures, the Nvidia Kepler and the Intel MIC, both in a single system and in a cluster configuration, for the simulation of spin systems. As a benchmark we consider the time required to update a single spin of the 3D Heisenberg spin glass model by using the Over-relaxation algorithm. We also present data for a traditional high-end multi...
Investigation of Portable Event Based Monte Carlo Transport Using the Nvidia Thrust Library
Power consumption considerations are driving future high performance computing platforms toward many-core computing architectures. The Trinity machine to become available at Los Alamos National Laboratory in 2016 will use both Intel Xeon Haswell processors and Intel Xeon Phi Knights Landing many integrated core (MIC) architecture coprocessors. The Sierra machine to be available at Lawrence Live...
Many-core applications to online track reconstruction in HEP experiments
Interest in parallel architectures applied to real time selections is growing in High Energy Physics (HEP) experiments. In this paper we describe performance measurements of Graphic Processing Units (GPUs) and Intel Many Integrated Core architecture (MIC) when applied to a typical HEP online task: the selection of events based on the trajectories of charged particles. We use as benchmark a scal...
A Generic Checkpoint-Restart Mechanism for Virtual Machines
It is common today to deploy complex software inside a virtual machine (VM). Snapshots provide rapid deployment, migration between hosts, dependability (fault tolerance), and security (insulating a guest VM from the host). Yet, for each virtual machine, the code for snapshots is laboriously developed on a per-VM basis. This work demonstrates a generic checkpoint-restart mechanism for virtual ma...
A quantitative Comparison of Checkpoint with Restart and Replication in Volatile Environments
Volatile computing environments such as desktop grids differ from traditional systems in the high volatility of compute nodes in both reachability and availability of compute resources. As a result, different fault-tolerant techniques are required to ensure efficient execution of parallel jobs. This technical report summarizes failure and availability patterns of distributed computing systems; ...
Journal: CoRR
Volume: abs/1311.0272
Publication date: 2013